1. Introduction

Gürsel and Tinto [1] first noted that a system of three ground-based interferometers
sensitive to two incoming gravitational wave polarisations over-determines the signal, 
permitting a consistency check in the absence of knowledge of the waveform. Their 
statistic was based on the failure to reject the null hypothesis that the observations 
were consistent with a gravitational wave from a postulated direction. Subsequent 
authors generalised the Gürsel-Tinto statistic to more than three interferometers and
supplemented it with other consistency tests [2], and proposed various alternatives
related to it [3] [4] [5] [6]. With the partial exception of [7], the use of Bayesian inference
in the literature on coherent detection of unmodelled bursts has been confined to the 
justification of individual steps in the design of various statistics. 

A fully Bayesian treatment of the problem requires (and uniquely follows 
from) the explicit specification of competing models of an experiment, including an 
explicit statement of how plausible we find any particular combination of the model 
parameters. This includes working out a priori how to 'spread our bets' over the large 
dimensional space of all possible signal waveforms. Such prior plausibility distributions 
are necessarily subjective — there is no 'right' nor 'wrong' way to choose them — but 
they do make definite statements about our expectations, and it is difficult to conceive 
of a scientist who would knowingly bet, for instance, that most detected gravitational 
waves will have a strain greater than unity. 

The Bayesian requirement for a waveform prior stands in stark contrast to
Gürsel-Tinto's advertised independence of signal waveform. The difference reflects a common
criticism of Bayesian inference: that the use of priors introduces subjectivity and even 
bias into an analysis. One response to this criticism is to note that non-Bayesian 
statistics are not free of prior expectations about the world, but that their priors 
are merely implicit and unexamined, and because of this may even contradict the stated
intent of their designers. This suggests the question: what choice of priors will make
a Bayesian statistic behave like the Gürsel-Tinto statistic and its relatives?

In this paper we set up a Bayesian analysis treating the experiment as a linear 
model with unspecified multivariate normal distributions as the priors on noise and 
signal. We then demonstrate how the Bayesian statistic reduces to (or limits to) 
Gürsel-Tinto and other related methods if we choose certain priors. The results are
astonishing: none of the methods assume a uniform signal population across the sky, 
and most assume gravitational waves are unphysically large or undetectably small and 
very frequent. These priors are not 'wrong', but they are very far from the best guess 
of the scientists that designed them. Nor are the statistics necessarily ineffective, but 
they must be less effective than a practically implementable Bayesian statistic whose
priors better reflect the real world. 

2. Bayesian formulation 

We can formulate much of the problem as a linear system. Pack N time series, each 
of an observatory's M measurements, into a vector 

$$ x = [x_{1,1}, x_{1,2}, \ldots, x_{1,M}, x_{2,1}, \ldots, x_{N,M}]^T. \qquad (1) $$

We could work equally well with any linear transformation of the time series (such as 
the Fourier or wavelet transforms). 



Robust Bayesian detection of unmodelled bursts 



One model for the noise in the system is that it is drawn from a multivariate 
normal distribution, described by an MN x MN covariance matrix A. The plausibility 
of x assuming the noise-only hypothesis is then 

$$ p(x \mid \text{'noise'}) = \frac{1}{(2\pi)^{MN/2} \sqrt{\det A}} \exp\left(-\tfrac{1}{2} x^T A^{-1} x\right). \qquad (2) $$

We may search for a particular signal 

$$ h = [h_{+,1}, h_{+,2}, \ldots, h_{+,M}, h_{\times,1}, \ldots, h_{\times,M}]^T \qquad (3) $$

which produces an additive response Fh, where the MN x 2M response matrix 
F includes the antenna patterns and geometrical time delay associated with 
the particular direction $(\theta, \phi)$ of the signal and the physical locations of the
interferometers. The distribution of the data expected in the presence of this signal is 

$$ p(x \mid h) = \frac{1}{(2\pi)^{MN/2} \sqrt{\det A}} \exp\left[-\tfrac{1}{2} (x - Fh)^T A^{-1} (x - Fh)\right]. \qquad (4) $$

This form is of no direct use because we do not know any particular signal h we want 
to search for. Bayesian marginalisation solves this problem for us but requires us to 
place a prior plausibility distribution on h: 

$$ p(x \mid \text{'signal'}) = \int dh\, p(h)\, p(x \mid h). \qquad (5) $$

We must specify a priori how likely we think any particular signal is to occur. If the 
marginalisation integral is to be solved symbolically, this distribution must take the 
form of a multivariate normal distribution. However, some desirable signal models 
(those where the signal has fewer than 2M degrees of freedom) will result in singular 
covariance matrices. Instead we adopt a multivariate normal distribution with an 
L x L covariance matrix Z for abstract signal parameters y related to h through a 
2M x L matrix W of template waveforms, so that h = Wy. 

A source with known waveforms $w_+$ and $w_\times$ whose relative amplitudes and
projections onto the detector's polarisation basis are unknown (as is the case for a
source of unknown polarisation angle and inclination) could be modelled with four
amplitude parameters, $Z = I$ and

$$ W = \begin{bmatrix} w_+ & w_\times & 0 & 0 \\ 0 & 0 & w_+ & w_\times \end{bmatrix}. \qquad (6) $$

This is a Bayesian analogue of a coherent matched filter analysis. 

The least informative multivariate normal distribution prior has 2M parameters
(one for each sample of each polarisation), with $Z = I$ and $W = \sigma I$ where $\sigma$ is a scale
factor. This prior says that we know nothing about $h_i$ (save that it is unlikely to be
very much larger than $\sigma$); we do not know if it is positive or negative or zero, and
we do not know if it is correlated with $h_j$ positively or negatively or not at all. This
distribution also describes white noise, and we will call it a 'white noise' signal prior,
even though it detects all signals, not just those we would classify as 'noise-like'.

With a signal prior defined we may now solve 

$$ p(x \mid \text{'signal'}) = \int dy\, p(y)\, p(x \mid h) $$

$$ = \int dy\, \frac{\exp\left\{-\tfrac{1}{2}\left[(x - FWy)^T A^{-1} (x - FWy) + y^T Z^{-1} y\right]\right\}}{(2\pi)^{(MN+L)/2} \sqrt{\det A \det Z}} \qquad (7) $$

$$ = \frac{1}{(2\pi)^{MN/2} \sqrt{\det C}} \exp\left(-\tfrac{1}{2} x^T C^{-1} x\right) \qquad (8) $$

where $C^{-1} = A^{-1} - (A^{-1}FW)[(FW)^T A^{-1} (FW) + Z^{-1}]^{-1} (A^{-1}FW)^T$.
This is just another multivariate normal distribution.
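As a numerical sanity check (not part of the original analysis), the sketch below builds the covariance $C = A + (FW) Z (FW)^T$ that results from pushing a normal prior on y through the linear model and adding independent noise, and confirms that the expression for $C^{-1}$ given above is its inverse (the Woodbury identity). The toy dimensions, the random matrices and the function names are assumptions chosen purely for illustration.

```python
import numpy as np

def signal_covariance(A, F, W, Z):
    """Covariance of the data under the 'signal' hypothesis: A + (FW) Z (FW)^T."""
    FW = F @ W
    return A + FW @ Z @ FW.T

def signal_precision(A, F, W, Z):
    """C^{-1} in the form quoted in the text (Woodbury identity)."""
    A_inv = np.linalg.inv(A)
    G = A_inv @ F @ W                              # A^{-1} F W
    inner = (F @ W).T @ G + np.linalg.inv(Z)       # (FW)^T A^{-1} (FW) + Z^{-1}
    return A_inv - G @ np.linalg.solve(inner, G.T)

# Toy problem (hypothetical sizes): MN = 6 data samples, 2M = 4, L = 3.
rng = np.random.default_rng(0)
R = rng.normal(size=(6, 6))
A = R @ R.T + 6 * np.eye(6)                        # symmetric positive definite noise covariance
F = rng.normal(size=(6, 4))                        # stand-in response matrix
W = rng.normal(size=(4, 3))                        # stand-in template matrix
Z = np.diag(rng.uniform(0.5, 2.0, size=3))         # prior covariance of y

C = signal_covariance(A, F, W, Z)
C_inv = signal_precision(A, F, W, Z)
woodbury_ok = np.allclose(C_inv @ C, np.eye(6))
```

The Woodbury form only inverts an L x L matrix beyond $A^{-1}$, which is why it is attractive when the signal model has few parameters.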

The Bayesian odds ratio tells us how likely the 'signal' hypothesis is, given the 
observation x in terms of how likely the observation is, given the two hypotheses under 
consideration and the prior plausibility of those hypotheses: 
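$$ O = \frac{p(\text{'signal'} \mid x)}{p(\text{'noise'} \mid x)} = \frac{p(\text{'signal'})}{p(\text{'noise'})} \frac{p(x \mid \text{'signal'})}{p(x \mid \text{'noise'})} $$

$$ = \frac{p(\text{'signal'})}{p(\text{'noise'})} \sqrt{\frac{\det A}{\det C}} \exp\left[-\tfrac{1}{2} x^T (C^{-1} - A^{-1}) x\right]. \qquad (9) $$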

We should choose the prior odds ratio $p(\text{'signal'})/p(\text{'noise'})$ to be $\ll 1$, reflecting the
infrequency of detectable gravitational waves. (This is analogous to choosing the
threshold $\lambda$ for a frequentist likelihood.)

This is not yet a gravitational wave search; it looks for any gravitational wave
of a known size from a known direction at a known time. In §4.2 we will see how
to perform a Bayesian search. However, this odds ratio is the part of the Bayesian
analysis that can be compared with previously proposed methods.
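A minimal sketch of how the log of the odds ratio in Equation (9) might be evaluated numerically is given below. It assumes NumPy, forms C directly as $A + (FW) Z (FW)^T$ rather than through the Woodbury expression, and uses a made-up response matrix, signal scale and prior odds; none of these values come from the paper.

```python
import numpy as np

def log_odds(x, A, F, W, Z, log_prior_odds):
    """ln O = ln prior odds + (1/2) ln(det A / det C) - (1/2) x^T (C^-1 - A^-1) x."""
    FW = F @ W
    C = A + FW @ Z @ FW.T                          # data covariance under 'signal'
    _, logdet_A = np.linalg.slogdet(A)
    _, logdet_C = np.linalg.slogdet(C)
    quad_noise = x @ np.linalg.solve(A, x)         # x^T A^-1 x
    quad_signal = x @ np.linalg.solve(C, x)        # x^T C^-1 x
    return log_prior_odds + 0.5 * (logdet_A - logdet_C) - 0.5 * (quad_signal - quad_noise)

# Hypothetical toy setup: white detectors, 'white noise' signal prior of scale sigma.
rng = np.random.default_rng(1)
F = rng.normal(size=(6, 4))                        # stand-in for the response matrix
A = np.eye(6)
sigma = 5.0
W = sigma * np.eye(4)
Z = np.eye(4)
log_prior = np.log(1e-3)                           # prior odds << 1, value is illustrative

quiet = np.zeros(6)                                # data with no excess energy
loud = F @ (sigma * rng.normal(size=4))            # data lying in the span of F
log_odds_quiet = log_odds(quiet, A, F, W, Z, log_prior)
log_odds_loud = log_odds(loud, A, F, W, Z, log_prior)
```

Working with `slogdet` and `solve` avoids forming explicit inverses and keeps the determinant ratio from overflowing for long data vectors.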

3. Comparison with existing methods 

Several previously proposed tests have been expressed in the form 

$$ x^T B x > \lambda \qquad (10) $$
for matrix B and threshold $\lambda$. The Bayesian test is instead

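$$ \frac{p(\text{'signal'})}{p(\text{'noise'})} \sqrt{\frac{\det A}{\det C}} \exp\left[-\tfrac{1}{2} x^T (C^{-1} - A^{-1}) x\right] > 1 \qquad (11) $$

or, rearranging Equation (11),

$$ x^T (A^{-1} - C^{-1}) x > -2 \ln\left[\frac{p(\text{'signal'})}{p(\text{'noise'})} \sqrt{\frac{\det A}{\det C}}\right] \qquad (12) $$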

suggesting that by the appropriate choice of priors (including the signal model) we 
can create a Bayesian test that behaves like one of the previously proposed tests. 

3.1. Tikhonov regularised statistic 

Consider $A = I$, $W = \sigma I$ and $Z = I$ (white noise detectors and white noise signal of
characteristic amplitude $\sigma$). Also let

$$ \frac{p(\text{'signal'})}{p(\text{'noise'})} = e^{-\lambda/2} \sqrt{\det(\sigma^2 F^T F + I)}. \qquad (13) $$

As F varies with direction, so too does the prior. For these priors, the Bayesian test
of Equation (12) becomes

$$ x^T F (F^T F + \sigma^{-2} I)^{-1} F^T x > \lambda \qquad (14) $$
which is the Tikhonov regularised statistic of [6] with regulariser a = 

The priors show that the Tikhonov regularised statistic implicitly assumes that 
gravitational wave bursts have a particular size (this physical interpretation of the 
regulariser was not made in [5]) and are distributed non-uniformly on the sky. The 
strength of the directional bias depends on the size of the expected signal. 
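The equivalence claimed here can be checked numerically. The sketch below, which assumes NumPy and uses random stand-in matrices rather than real antenna patterns, compares the Tikhonov quadratic form with $x^T (A^{-1} - C^{-1}) x$ for $A = I$, $C = I + \sigma^2 F F^T$, and evaluates the log of the prior odds implied by Equation (13) for two hypothetical directions of different sensitivity. All function names and numerical values are illustrative assumptions.

```python
import numpy as np

def tikhonov_statistic(x, F, sigma):
    """x^T F (F^T F + sigma^-2 I)^-1 F^T x, the left side of Equation (14)."""
    Ftx = F.T @ x
    reg = F.T @ F + np.eye(F.shape[1]) / sigma**2
    return Ftx @ np.linalg.solve(reg, Ftx)

def bayes_quadratic_form(x, F, sigma):
    """x^T (A^-1 - C^-1) x with A = I and C = I + sigma^2 F F^T."""
    C = np.eye(F.shape[0]) + sigma**2 * (F @ F.T)
    return x @ x - x @ np.linalg.solve(C, x)

def implied_log_prior_odds(F, sigma, lam):
    """ln of Equation (13): -lam/2 + (1/2) ln det(sigma^2 F^T F + I)."""
    _, logdet = np.linalg.slogdet(sigma**2 * (F.T @ F) + np.eye(F.shape[1]))
    return -0.5 * lam + 0.5 * logdet

rng = np.random.default_rng(2)
x = rng.normal(size=6)
F_a = rng.normal(size=(6, 4))      # hypothetical direction with good sensitivity
F_b = 0.1 * F_a                    # hypothetical direction with ten times weaker response
sigma, lam = 5.0, 20.0

same_statistic = np.isclose(tikhonov_statistic(x, F_a, sigma),
                            bayes_quadratic_form(x, F_a, sigma))
prior_a = implied_log_prior_odds(F_a, sigma, lam)
prior_b = implied_log_prior_odds(F_b, sigma, lam)
```

In this toy case `prior_a` exceeds `prior_b`, which illustrates the directional bias: the implied prior favours directions where the network responds more strongly.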
3.2. Gürsel-Tinto statistic

If we take the limit $\sigma \to \infty$ (the large signal limit), Equation (14) limits to

$$ x^T F (F^T F)^{-1} F^T x > \lambda \qquad (15) $$

which is the Gürsel-Tinto statistic [1].

The priors show that the Gürsel-Tinto statistic implicitly assumes that gravitational
wave bursts are very large, very frequent (note that $p(\text{'signal'})/p(\text{'noise'}) \to \infty$),
and distributed non-uniformly on the sky.

The effect of these priors is not so dramatic as we might expect; Gürsel-Tinto
works in practice. Signals of physically reasonable sizes are only an infinitesimal part
of this signal population, but signals from this population occur infinitely frequently,
and thus the method is able to detect them. Even so, one of Gürsel-Tinto's common
failure modes is to misidentify a typical gravitational wave injection as a much larger 
(but, it believes, still likely to occur) gravitational wave with a different direction and 
polarisation that the network is nearly insensitive to. 
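The large-signal limit can be illustrated with a short numerical sketch. It assumes NumPy and a random stand-in for F, and computes the Gürsel-Tinto quadratic form as the energy of x projected onto span F using a least-squares solve; the function names and the chosen values of $\sigma$ are hypothetical.

```python
import numpy as np

def tikhonov_statistic(x, F, sigma):
    """x^T F (F^T F + sigma^-2 I)^-1 F^T x (regularised form)."""
    Ftx = F.T @ x
    reg = F.T @ F + np.eye(F.shape[1]) / sigma**2
    return Ftx @ np.linalg.solve(reg, Ftx)

def gursel_tinto_statistic(x, F):
    """x^T F (F^T F)^-1 F^T x: energy of x projected onto span F."""
    coeffs, *_ = np.linalg.lstsq(F, x, rcond=None)   # least-squares fit of x by columns of F
    return x @ (F @ coeffs)

rng = np.random.default_rng(3)
x = rng.normal(size=6)
F = rng.normal(size=(6, 4))                          # stand-in response matrix

projected = gursel_tinto_statistic(x, F)
finite = tikhonov_statistic(x, F, sigma=1.0)         # moderate signal scale
large = tikhonov_statistic(x, F, sigma=1e4)          # approaching the sigma -> infinity limit
limit_reached = np.isclose(large, projected)
```

For any finite $\sigma$ the regularised statistic is smaller than the projected energy, because the regulariser penalises fits that need large signal amplitudes; that penalty is what vanishes in the Gürsel-Tinto limit.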



3.3. Soft constraint statistic 

Now allow the signal's characteristic amplitude to vary with direction, so that
$W = \sigma k(\Omega) I$, and divide the Bayesian test in Equation (12) through by $\sigma^2$:

$$ \frac{1}{\sigma^2} x^T F \left(F^T F + \frac{1}{\sigma^2 k^2(\Omega)} I\right)^{-1} F^T x > -\frac{2}{\sigma^2} \ln \frac{p(\text{'signal'})/p(\text{'noise'})}{\sqrt{\det[\sigma^2 k^2(\Omega) F^T F + I]}}. $$

Let

$$ \frac{p(\text{'signal'})}{p(\text{'noise'})} = e^{-\lambda\sigma^2/2} \sqrt{\det[\sigma^2 k^2(\Omega) F^T F + I]}. \qquad (16) $$

Then in the limit of $\sigma \to 0$ we have

$$ k^2(\Omega)\, x^T F F^T x > \lambda \qquad (17) $$

which is the soft constraint statistic of [3] if the normalization function $k^{-2}(\Omega)$
is equal to the principal eigenvalue of $F^T F$ (though the normalization function
$k^{-2}(\Omega) = \mathrm{tr}\, F^T F$ is required to give a direction-free null hypothesis distribution).

The priors show that the soft constraint statistic implicitly assumes gravitational 
wave bursts are very small and very frequent (note that $p(\text{'signal'})/p(\text{'noise'}) \to 1$).
The Bayesian statistic can only draw very weak conclusions about the presence of
very small signals (the division of Equation (12) by $\sigma^2$ is necessary to 'blow up' its
diminishing range), so even though the priors depend only weakly on direction they 
still strongly bias the soft constraint. 
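The small-signal limit can be checked the same way. The sketch below assumes NumPy, $A = Z = I$, a random stand-in for F, and the trace normalisation $k^{-2} = \mathrm{tr}\, F^T F$; it evaluates the rescaled Bayesian quadratic form at a small $\sigma$ and compares it with the soft constraint form, and also evaluates the log of the prior odds of Equation (16). Names and numerical values are illustrative assumptions.

```python
import numpy as np

def scaled_bayes_statistic(x, F, sigma, k):
    """(1/sigma^2) x^T F (F^T F + I/(sigma k)^2)^-1 F^T x, rewritten to stay finite as sigma -> 0."""
    Ftx = F.T @ x
    reg = sigma**2 * (F.T @ F) + np.eye(F.shape[1]) / k**2
    return Ftx @ np.linalg.solve(reg, Ftx)

def soft_constraint_statistic(x, F, k):
    """k^2 x^T F F^T x, the left side of Equation (17)."""
    Ftx = F.T @ x
    return k**2 * (Ftx @ Ftx)

def log_prior_odds(F, sigma, k, lam):
    """ln of Equation (16): -lam sigma^2 / 2 + (1/2) ln det(sigma^2 k^2 F^T F + I)."""
    _, logdet = np.linalg.slogdet(sigma**2 * k**2 * (F.T @ F) + np.eye(F.shape[1]))
    return -0.5 * lam * sigma**2 + 0.5 * logdet

rng = np.random.default_rng(4)
x = rng.normal(size=6)
F = rng.normal(size=(6, 4))                      # stand-in response matrix
k = 1.0 / np.sqrt(np.trace(F.T @ F))             # trace normalisation
lam = 20.0
small_sigma = 1e-4

converged = np.isclose(scaled_bayes_statistic(x, F, small_sigma, k),
                       soft_constraint_statistic(x, F, k))
small_sigma_log_prior = log_prior_odds(F, small_sigma, k, lam)
```

The log prior odds come out very close to zero, matching the observation that the implied prior odds tend to 1 in this limit.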



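3.4. Hard constraint statistic

Now consider a linearly polarised signal model with a polarisation angle that is some
known function of direction $\psi(\Omega)$ and $W = \sigma k(\Omega) U$ where

$$ U = \begin{bmatrix} \cos 2\psi(\Omega)\, I \\ \sin 2\psi(\Omega)\, I \end{bmatrix}. \qquad (18) $$

Let

$$ \frac{p(\text{'signal'})}{p(\text{'noise'})} = e^{-\lambda\sigma^2/2} \sqrt{\det[\sigma^2 k^2(\Omega) (FU)^T FU + I]}. \qquad (19) $$
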
Figure 1. Four statistics as a function of direction $\Omega = (\theta, \phi)$ for an injection
of 1/16 seconds of white noise with amplitude SNR 5 into 3 identical white
noise detectors sampled at 1024 Hz with the locations and orientations of LIGO
Hanford, LIGO Livingston and Virgo. Black is least plausible; white is most
plausible; a circle indicates the true direction; a square indicates the maximum
of the statistic. Top left: ln Bayesian odds ratio with $\sigma = 5$; the signal is
localised with 95% confidence to $10^{-4}$ of the sky including the true direction.
Top right: Tikhonov with a = 1/5; this differs from the Bayesian case only by
the direction prior. Bottom left: Gürsel-Tinto; note the discontinuities where the
network becomes insensitive to one polarisation. Bottom right: Soft constraint
with $k(\Omega)^{-2} = \mathrm{tr}\, F^T F$. For this injection, the global maxima of the
non-Bayesian statistics are not consistent with the true direction.

Then in the limit of $\sigma \to 0$ the Bayesian test Equation (12) divided by $\sigma^2$ goes to

$$ k^2(\Omega)\, x^T FU (FU)^T x > \lambda. \qquad (20) $$

If the polarisation angle is defined as the one the network is most sensitive to, and the
normalisation function $k^{-2}(\Omega)$ is taken to be the principal eigenvalue of $(FU)^T FU$,
this expression is the hard constraint statistic of [3].

As well as the explicit design assumptions that gravitational wave bursts are 
linearly polarised and optimally oriented, the priors show that the hard constraint 
statistic implicitly assumes that gravitational wave bursts are very small and very 
frequent (note that $p(\text{'signal'})/p(\text{'noise'}) \to 1$), and (like the soft constraint) is strongly
affected by the weak directional dependency of its priors. 

3.5. Implications of implicit priors

The unphysical priors required to make a Bayesian analysis behave like any of the
previously proposed methods indicate that none of these methods is optimal for the
realistic scenarios their designers were targeting; they are optimal for greatly different
scenarios, where gravitational waves are infinitely large or small and not uniformly
distributed on the sky. A Bayesian analysis whose priors better reflect the true signal
population will necessarily outperform all of these methods (see Figure 1), but the
analysis above does not indicate how much better it will do.
What went wrong in the design of these methods? In the case of Gürsel-Tinto, the
good idea of waveform independence was translated into an uninformative improper 
prior; but if an observation can be explained by either a moderately sized signal or 
an unphysically enormous signal, it is not optimal to treat these as equally credible 
alternatives. Ensuring that all directions have the same null hypothesis distribution 
counter-intuitively introduces a bias on the sky, because where the network is more 
sensitive the absence of excess power is actually stronger evidence against the signal 
hypothesis. The 'two detector paradox' described in [3] is simply an artefact of
Gürsel-Tinto's uninformative prior. Of the ad hoc attempts to fix the 'paradox' described
in [3] [6], only the Tikhonov regulariser constitutes an unambiguous theoretical
improvement. 

We have not considered here all the methods that have been proposed in the 
literature. Some of these, notably the other regularisation method in [6] and the 
almost-Tikhonov-regularisation in [5], likely suffer from similar problems. 

4. Bayesian search 

4.1. Robustness

All the methods discussed so far (including the Bayesian test) consider the only source
of excess energy in interferometers to be a gravitational wave. Data orthogonal to
span F is ignored because the noise and signal hypotheses make identical predictions
and cannot be distinguished by it, even though that data is the Gürsel-Tinto null
stream [1] and excess energy in it rejects the gravitational wave hypothesis. Reference [2]
supplemented Gürsel-Tinto with an ad hoc statistic in an attempt to solve this
shortcoming.

A better approach (used in [8]) is to include interferometer 'glitches' in the set 
of hypotheses under consideration; they are just another kind of signal model. A 
glitch can be represented as a different noise covariance distribution replacing A. For
example, if white noise glitches occur in white detectors with plausibility p('glitch'), are
additive and have characteristic amplitude $\gamma$, then we have a family of $2^N$ covariance
distributions such as $A_{01\ldots0} = \mathrm{diag}[I, (1 + \gamma^2) I, \ldots, I]$ with prior plausibilities of the
form $p(\text{'glitch'})^k [1 - p(\text{'glitch'})]^{N-k}$, where k is the number of glitching detectors. The
N-glitch hypothesis can 'explain' any data,
but it incurs a large Occam penalty for its generality, whereas the signal hypothesis can
more parsimoniously explain the tiny subspace of all possible data that is consistent
with a gravitational wave. Signals from directions (or with a polarisation) that one
detector is insensitive to are forced to compete against the less penalised $N - 1$ glitch
hypotheses. If we have more knowledge about glitches, we can use a set of glitch
waveforms and parameters analogously to the construction of our signal hypothesis.
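The family of glitch hypotheses can be enumerated directly. The following sketch assumes independent glitches in N white detectors, so that a pattern with k glitching detectors has prior $p^k (1 - p)^{N-k}$, and records for each pattern the per-detector noise variance (1 for a quiet detector, $1 + \gamma^2$ for a glitching one) that would fill the diagonal blocks of the covariance. The function name and the numerical values are hypothetical.

```python
from itertools import product

def glitch_hypotheses(n_detectors, p_glitch, gamma):
    """Enumerate all 2^N glitch patterns with their prior and per-detector variances."""
    hypotheses = []
    for pattern in product((0, 1), repeat=n_detectors):
        k = sum(pattern)                                   # number of glitching detectors
        prior = p_glitch**k * (1 - p_glitch)**(n_detectors - k)
        # Each entry scales the identity block of the matching detector.
        variances = [1 + gamma**2 if glitching else 1.0 for glitching in pattern]
        hypotheses.append((pattern, prior, variances))
    return hypotheses

# Illustrative values only: three detectors, 1% glitch plausibility, amplitude 10.
hypotheses = glitch_hypotheses(n_detectors=3, p_glitch=0.01, gamma=10.0)
total_prior = sum(prior for _, prior, _ in hypotheses)
by_pattern = {pattern: (prior, variances) for pattern, prior, variances in hypotheses}
```

The all-zero pattern is the ordinary noise hypothesis, and the priors over all patterns sum to one, so the glitch family simply refines the 'noise' side of the odds ratio.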

4.2. Marginalisation

To perform a search over parameters the model is not linear in, like time-of-arrival $\tau$,
sky direction $\Omega$ and signal model types, we need to numerically marginalise over these
variables. For example, we can produce the odds ratio for a signal from any direction
by marginalising over $\Omega$.
Previously proposed methods would instead maximise their statistic over their
parameters, so we cannot directly compare this step, other than to note that for 
confidently detected signals the marginalisation will be dominated by a single peak 
and the integral will be proportional to the maximum of the integrand. 
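One way such a numerical marginalisation over direction might look is sketched below. It assumes the sky has been discretised into pixels, that a log odds ratio has already been computed for each pixel, and that the direction prior is supplied as log weights summing to one; the log-sum-exp form is a standard numerical device and is not taken from the paper. All names and values are hypothetical.

```python
import math

def log_marginal_odds(log_odds_per_direction, log_prior_weights):
    """ln of sum_i prior_i * odds_i, computed stably with the log-sum-exp trick."""
    terms = [lo + lw for lo, lw in zip(log_odds_per_direction, log_prior_weights)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

# Illustrative case: three sky pixels with a uniform direction prior and one dominant peak.
log_odds_map = [0.0, 50.0, 1.0]
uniform = [-math.log(len(log_odds_map))] * len(log_odds_map)
marginal = log_marginal_odds(log_odds_map, uniform)
peak_only = max(log_odds_map) - math.log(len(log_odds_map))
```

With one dominant pixel, `marginal` and `peak_only` agree to high precision, which is the sense in which marginalising and maximising coincide for confidently detected signals.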

Signals from sources distributed uniformly in space will have a power-law 
distribution of characteristic amplitudes $p(\sigma) \propto \sigma^{-4}$ (this encodes the 'twice the
sensitivity, eight times the event rate' rule of thumb). This provides us with a simple
example of a parameterised set of signal models $W = \sigma W_1$ and a physically-inspired
prior on the parameter. We may also want to look for signals parameterised in other
ways. For example, we may want to consider different time and frequency bounds on
time- and/or frequency-band-limited signals.
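The power law can be recovered in a few lines, assuming identical sources scattered uniformly through Euclidean space whose observed amplitude falls off inversely with distance r (both are simplifying assumptions made for this sketch):

```latex
% Number of sources within distance r, and amplitude versus distance:
N(<r) \propto r^{3}, \qquad \sigma \propto r^{-1}
% so the number of sources louder than sigma is
\;\Rightarrow\; N(>\sigma) \propto \sigma^{-3}
% and the amplitude density follows by differentiation:
\;\Rightarrow\; p(\sigma) = -\frac{dN(>\sigma)}{d\sigma} \propto \sigma^{-4}.
% Doubling the sensitivity halves the smallest detectable amplitude:
\frac{N(>\sigma/2)}{N(>\sigma)} = 2^{3} = 8.
```

The last line is the 'eight times the event rate' rule of thumb quoted above.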

4.3. Implementation

If the noise and signal models are stationary, the Bayesian statistic for a particular 
signal size can be implemented (using a transformation to the Fourier domain that 
diagonalises C) as efficiently as the coloured noise versions of the previously proposed 
statistics considered here. If we declare the glitch and signal hypotheses exclusive 
(so that we will reject the very infrequent gravitational wave that occurs at the 
same time as a glitch) and model glitches in each detector as independent, we incur 
only a minimal extra cost to gain robustness against that population of instrumental 
artefacts. Marginalising over $\sigma$ can reuse many intermediate values and requires
relatively few samples for a steep $\sigma^{-4}$ prior. Many other signal models present
comparable opportunities for optimisation. 

5. Conclusion 

A simple Bayesian analysis of the problem of coherent detection of unmodelled 
gravitational wave bursts with a network of ground-based interferometric gravitational 
wave detectors is presented. It reduces or limits to several previously proposed 
methods for the same problem only when unphysical priors are used, indicating that 
those previously proposed methods cannot be optimal for realistic signal populations. 
A method for improving the robustness of the Bayesian method by substituting a 
more realistic noise model was noted, as were the steps for generalising to an all-sky 
unknown-arrival-time search. The Bayesian method can be implemented, depending 
on the signal model, to be computationally competitive with existing methods. 

We can contrast the process of specifying physical models of the world, quantifying 
their predictions and forming the Bayesian statistic in §3 with the looser process
of heuristically arguing our way to a statistic [1] and then noting its flaws and
heuristically arguing our way to a mutated statistic [2] [3] [6]. In both cases, designers
have freedom and subjective choices are made. In the Bayesian case, these are 
restricted to choosing the physical model and its priors; these have immediate physical 
interpretations which are explicit and can be contested. The Bayesian statistic follows 
uniquely from them. In the non-Bayesian case choices are made at every stage and 
their implications are uncertain, leaving us open to unexamined, unintended and 
unfortunate consequences. 